Completeness and Reliability of Wikipedia Infoboxes in Various Languages
Abstract
Despite its popularity, Wikipedia is often criticized for poor information quality. This online knowledge base currently contains over 45 million articles in almost 300 languages. Wikipedia articles often include special tables that briefly present key information about people, places, products, organizations, and other subjects. Such a table is usually placed in a prominent part of the article, and the Wikipedia community calls it an "infobox". Infoboxes hold information in a structured form, which allows popular public databases such as DBpedia to be enriched automatically. Wikipedia users can edit infoboxes in different language versions independently, so the quality of information about the same subject may differ between language versions. This article examines the completeness and reliability of infoboxes on different topics in seven language versions of Wikipedia: English, German, French, Polish, Russian, Ukrainian, and Belarusian. The results of the study can be used to automatically assess and improve the quality of information in Wikipedia as well as in other public knowledge bases.
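The abstract treats infoboxes as structured attribute-value data whose completeness can vary between language versions. The sketch below illustrates one simple way such completeness could be measured; it is not the method used in the article. It assumes that completeness is the share of an expected parameter list that has a non-empty value. The expected list and the helpers `parse_infobox` and `completeness` are hypothetical; in practice the list would come from each language's template documentation. The parser is deliberately simplified: it splits on `|` only at the top level of the template, so nested templates and links stay inside values, but it does not handle triple-brace parameters or unbalanced markup.

```python
import re


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return the named parameters of the first {{Infobox ...}} template.

    Hypothetical, simplified helper: splits on '|' only at the top level of
    the template, so nested templates ({{...}}) and links ([[...|...]]) stay
    inside values. Triple-brace parameters and unbalanced markup are not handled.
    """
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)  # drop HTML comments
    match = re.search(r"\{\{\s*infobox", text, flags=re.IGNORECASE)
    if match is None:
        return {}

    parts: list[str] = []
    buf: list[str] = []
    depth = 0  # combined nesting depth of {{ }} and [[ ]]
    i = match.start()
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:  # nested opener belongs to the current value
                buf.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:  # closing braces of the infobox itself
                parts.append("".join(buf))
                break
            buf.append(pair)
            i += 2
        else:
            if text[i] == "|" and depth == 1:  # top-level parameter separator
                parts.append("".join(buf))
                buf = []
            else:
                buf.append(text[i])
            i += 1

    params: dict[str, str] = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:  # positional (unnamed) parameters are skipped
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def completeness(params: dict[str, str], expected: list[str]) -> float:
    """Assumed metric: share of expected parameters with a non-empty value."""
    if not expected:
        raise ValueError("expected parameter list must not be empty")
    filled = sum(1 for key in expected if params.get(key, "").strip())
    return filled / len(expected)


# Usage with a made-up article; the expected list is hypothetical.
sample = """'''Example Person''' was a hypothetical person.
{{Infobox person
| name        = Example Person
| birth_date  = {{birth date|1900|1|1}}
| birth_place = [[Warsaw]], [[Poland]]
| death_date  =
| spouse      = <!-- unknown -->
}}"""

expected = ["name", "birth_date", "birth_place", "death_date", "spouse"]
params = parse_infobox(sample)
print(completeness(params, expected))  # 0.6 (3 of 5 expected parameters filled)
```

Comparing language versions with a measure like this would also require mapping parameter names between editions, since template parameter names usually differ from one language edition to another; that mapping is outside this sketch.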
Similar resources
Experiments with Wikipedia Cross-Language Data Fusion
There are currently Wikipedia editions in 264 different languages. Each of these editions contains infoboxes that provide structured data about the topic of the article in which an infobox is contained. The content of infoboxes about the same topic in different Wikipedia editions varies in completeness, coverage and quality. This paper examines the hypothesis that by extracting infobox data fro...
WikiNet: A Very Large Scale Multi-Lingual Concept Network
This paper describes a multi-lingual concept network obtained automatically by mining for concepts and relations and exploiting a variety of sources of knowledge from Wikipedia. Concepts and their lexicalizations are extracted from Wikipedia pages. Relations are extracted from the category and page network, infoboxes and the body of the articles. The network consists of a central, language inde...
Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, ...
Automatic Detection of Outdated Information in Wikipedia Infoboxes
An infobox of a Wikipedia article generally contains key facts in the article and is organized as attribute-value pairs. Infoboxes not only allow readers to rapidly gather the most important information about some aspects of the articles in which they appear, but also provide a source for many knowledge bases derived from Wikipedia. However, not all the values of infobox attributes are updated ...
Acquiring Relational Patterns from Wikipedia: A Case Study
This paper proposes the automatic acquisition of binary relational patterns (i.e. portions of text expressing a relation between two entities) from Wikipedia. There are a few advantages behind the use of Wikipedia: (i) relations are represented in the DBpedia ontology, which provides a repository of concepts to be used as semantic variables within patterns; (ii) most of the DBpedia relations ap...